Upstream and dnst keyset TSIG support. (resolves #65) #564

Merged
ximon18 merged 124 commits into main from tsig-upstream-support on Apr 30, 2026

@ximon18 ximon18 commented Apr 2, 2026

Summary

This PR implements two of three pieces of work needed for #301: upstream TSIG support and support for configuring dnst keyset to override the nameserver to test against when in auto mode. The third piece, downstream TSIG support, is implemented by PR #587.

Upstream TSIG support consists of TSIG signing XFR related requests sent to the upstream nameserver, accepting and verifying the TSIG key used to sign a NOTIFY received from the upstream nameserver, and letting the operator configure Cascade with the details of a TSIG key and with which TSIG key to use with the upstream nameserver.

Note regarding downstream TSIG

Although downstream TSIG support has its own separate PR #587, some of the functionality is actually enabled by this PR. For example, adding TSIG keys to the TSIG key store in this PR automatically makes those keys available to the TsigMiddlewareSvc, which is already used by the zone server; it just had no keys to work with until now.

Known issues:

  • This PR doesn't completely deliver the above: notably it lacks verification that the right TSIG key was used to sign a received NOTIFY. It does verify that a known TSIG key was used, but cannot verify that only the right key was used rather than any key in our key store. This is due to a limitation of the underlying NotifyMiddlewareSvc from the domain crate which invokes a callback when a NOTIFY is received but fails to pass the TSIG key used to the callback defined in Cascade.
  • TSIG state isn't properly restored on startup: the zone source is not properly reconstituted and so lacks the TSIG key, leaving the key used by the zone orphaned after a restart. This issue wasn't caused by this PR but was found while testing it. It is fixed in separate PR FIX: Restore zone TSIG key state on startup. #590.

Changes made

CLI changes:

Policy changes:

  • New key-manager.publication-nameservers policy setting that corresponds to the dnst keyset set publication-nameservers command.

Daemon changes:

  • New /tsig/ API endpoints to match the new CLI commands.
  • INFO level messages about the zone source previously logged only the IP address. Now that the TSIG key name part of the source may no longer be None, the log message became quite verbose because it logged the key in debug format. This PR therefore adds a Display impl for the zone Source to avoid this.
  • In Center check that any TSIG keys referred to in policy are present on reload, and otherwise reject the policy.
  • In the loader don't discard the TSIG key name specified on a server source, instead use it to sign outgoing XFR related requests.
  • (De)serialize TSIG store key secrets as base64.
  • Use full hmac- prefixed names for TSIG algorithms.
  • Save added TSIG keys immediately. Otherwise, if zone add is done quickly enough after tsig add, dnst keyset may read the TSIG store JSON file and find that the key is missing because it has not yet been written to disk (due to the 5 second wait before dirty state is flushed to disk).
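
The Display change in the list above can be illustrated roughly as follows. This is only a sketch, not the code from this PR: the `Source` enum, its variants and the output format are assumptions made for the example. Only the `addr` and `tsig_key` field names are taken from the reviewed code.

```rust
use std::fmt;
use std::net::SocketAddr;

/// Hypothetical stand-in for the zone source type. The real type in
/// Cascade may have different variants and fields.
#[allow(dead_code)]
enum Source {
    /// No source configured.
    NoSource,
    /// An upstream server, optionally with the name of a TSIG key.
    Server {
        addr: SocketAddr,
        tsig_key: Option<String>,
    },
}

impl fmt::Display for Source {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Source::NoSource => write!(f, "none"),
            Source::Server { addr, tsig_key: None } => write!(f, "{addr}"),
            // Only the key name is shown, keeping the log line short.
            Source::Server { addr, tsig_key: Some(key) } => {
                write!(f, "{addr} (TSIG key '{key}')")
            }
        }
    }
}

fn main() {
    let source = Source::Server {
        addr: "192.0.2.1:53".parse().unwrap(),
        tsig_key: Some("example-key".to_string()),
    };
    // Uses Display ({}) rather than Debug ({:?}) in the log message.
    println!("Setting source of zone 'nl' to {source}");
}
```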

Documentation changes:

  • New cascade-tsig.rst man page for the new cascade tsig subcommand.
  • Updated cascade-zone.rst man page for the extension of the zone add --source argument syntax.
  • Renamed Integrations TOC entry to HSM Integrations and added a Nameserver Integrations TOC entry.
  • New Nameserver Integrations/NSD page showing how to use NSD as an upstream, optionally with TSIG.
  • New Zone Transfers page describing how Cascade communicates using zone transfers with upstreams and downstreams.

System test changes:

  • Update the Docker image to use newer dnst with support for the new dnst keyset set tsig-store-path and dnst keyset set publication-nameservers subcommands.
  • Extend the primary NSD configuration to include a TSIG tsig-key key definition and a new primary-tsig pattern that is modeled after the primary pattern but which requires the new tsig-key, and adds an example-tsig.test zone that uses it.
  • Add an upstream-tsig system test which tests XFR against NSD using the new "TSIG required" zone and associated settings: first deliberately without the TSIG key configured, so that zone loading fails, then with it configured, so that zone loading succeeds.

This PR still requires:

  • A system test.
  • RustDocs.
  • Man page updates. Document clearly what the TSIG key will be used for when passed to cascade zone add, e.g. for authenticating incoming NOTIFY? For authenticating XFR requests sent to the upstream? What about SOA queries sent to the upstream? Does the Cascade behaviour match the NSD behaviour and if not why not?
  • RtD manual updates. At a minimum document integration with NSD.
  • Discussion of the new CLI tsig add subcommand and the extension to the interpretation of the zone add --source argument.
  • Add a tsig list CLI command to list TSIG keys.
  • Add a tsig remove CLI command to remove TSIG keys. Removing an in-use TSIG key should presumably be disallowed. Perhaps there also needs to be a way to list the zones using a TSIG key, maybe cascade tsig show could show that?
  • Possibly also extend cascade zone status CLI command to show which TSIG key if any the zone is using with its source? Ignore this for now as it would collide with PR Status rework #567 and isn't strictly needed for this PR
  • Tidy up INFO cascaded::loader::zone: Setting source of zone 'nl' logging.

To think about:

  • What, for this PR, if anything, should happen if the TSIG store path is changed in the Cascade configuration?

  • If you are changing Rust code or integration tests (Cargo.*, crates/, etc/, integration-tests/, src/):

    • Did you run the integration tests with act through the act-wrapper (as described in TESTING.md)?
  • If you are adding/deleting man pages:

    • Did you update the man_pages config in doc/manual/source/conf.py?
    • Did you update the packaged man pages in the Cargo.toml?
    • Did you commit the freshly built man pages?
  • If you are modifying man pages:

    • Did you commit the updated built man pages?

- New `cascade tsig add` client command to request that the daemon add a
TSIG key to the Cascade TSIG store.
- New cascaded POST /tsig/ API to add a TSIG key to the Cascade TSIG
store.
- Extension of the `cascade zone add` command `--source` argument with
an optional `!<TSIG key name>` syntax to define the TSIG key that
Cascade should use when sending an XFR request to the upstream.
- Pass the key defined by the source to the zone loader instead of None
(the zone loader is already capable of using the key, it just wasn't
being told which key to use)
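
The `--source` extension described above could be parsed along these lines. This is a minimal sketch under stated assumptions: `SourceArg` and `parse_source` are hypothetical names, and the part before `!` is assumed to be a socket address. The real argument may accept other forms.

```rust
use std::net::SocketAddr;

/// Hypothetical parsed form of a `--source` argument.
#[derive(Debug, PartialEq)]
struct SourceArg {
    addr: SocketAddr,
    tsig_key: Option<String>,
}

/// Split `<addr>[!<TSIG key name>]` into its two parts.
fn parse_source(arg: &str) -> Result<SourceArg, String> {
    // '!' cannot occur in an IPv4 or IPv6 socket address, so the first
    // '!' (if any) separates the address from the key name.
    let (addr_str, key) = match arg.split_once('!') {
        Some((addr_str, key)) => (addr_str, Some(key)),
        None => (arg, None),
    };
    if key == Some("") {
        return Err("empty TSIG key name after '!'".to_string());
    }
    let addr = addr_str
        .parse::<SocketAddr>()
        .map_err(|e| format!("invalid address '{addr_str}': {e}"))?;
    Ok(SourceArg {
        addr,
        tsig_key: key.map(str::to_owned),
    })
}

fn main() {
    println!("{:?}", parse_source("192.0.2.1:53"));
    println!("{:?}", parse_source("192.0.2.1:53!example-key"));
    println!("{:?}", parse_source("192.0.2.1:53!"));
}
```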
@ximon18 ximon18 added this to the 0.1.0-beta1 milestone Apr 2, 2026
@ximon18 ximon18 added the enhancement New feature or request label Apr 2, 2026
@ximon18 ximon18 changed the title from "Upstream TSIG support." to "Upstream TSIG support. (resolves #65)" Apr 2, 2026
@ximon18 ximon18 linked an issue Apr 13, 2026 that may be closed by this pull request
Comment on lines +41 to +42
Incoming DNS messages that are TSIG signed will be rejected if the key used
to sign the message is not registered with Cascade.
Contributor

The description is accurate, added keys are relevant for all zones because once a key is added to the global Cascade store, any incoming DNS message (whether from upstream, e.g. NOTIFY, or from downstream, e.g. AXFR or SOA) will be handled by the TsigMiddlewareSvc which, even if a zone is not configured to use TSIG, will still reject the incoming message if that message uses a TSIG key which is not in the global Cascade TSIG store.

The description may be accurate, but the behavior sounds far from ideal. It sounds like the TSIG middleware service activates once the TSIG key store is non-empty; but this means TSIG config for one zone affects others, which is IMO quite surprising. Cascade should only allow TSIG keys that are relevant for the zone being queried; it's okay to document the real implemented behavior, but I think this description should also mention the direction we want to move to. We should create a GH issue for reaching the desired TSIG behavior and we could then link to it here.

This could be worked around by adding a custom middleware service layer impl between TsigMiddlewareSvc and NotifyMiddlewareSvc that does the "correct key" and "correct no key" checks.

Perhaps we could just add a check for this in the new ZoneService type. It has access to all the right information.
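
The "correct key" and "correct no key" checks discussed above amount to comparing the key a message was signed with against the key configured for the zone. A minimal sketch of that comparison follows, using hypothetical names and plain strings for key names rather than the types from the domain crate. Where such a check would live (a middleware layer or ZoneService) is left open.

```rust
/// Hypothetical outcome of comparing the key a message was signed with
/// against the key configured for the zone.
#[derive(Debug, PartialEq)]
enum KeyCheck {
    /// The message matches the zone's TSIG configuration.
    Ok,
    /// The zone requires a key but the message was unsigned.
    MissingKey,
    /// The zone has no key configured but the message was signed.
    UnexpectedKey,
    /// The message was signed with a different key than the zone's.
    WrongKey,
}

fn check_zone_key(expected: Option<&str>, used: Option<&str>) -> KeyCheck {
    match (expected, used) {
        (None, None) => KeyCheck::Ok,
        // TSIG key names are domain names, so compare case-insensitively.
        (Some(e), Some(u)) if e.eq_ignore_ascii_case(u) => KeyCheck::Ok,
        (Some(_), Some(_)) => KeyCheck::WrongKey,
        (Some(_), None) => KeyCheck::MissingKey,
        (None, Some(_)) => KeyCheck::UnexpectedKey,
    }
}

fn main() {
    // A key known to the global store but not configured for this zone.
    println!("{:?}", check_zone_key(Some("zone-a-key"), Some("zone-b-key")));
    // A zone without TSIG receiving a signed message.
    println!("{:?}", check_zone_key(None, Some("zone-b-key")));
}
```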

--------

https://cascade.docs.nlnetlabs.nl
Cascade online documentation
Contributor

"Cascade online documentation" doesn't make sense grammatically...


.. option:: publication-nameservers = []

A set of nameservers to use when checking for RRSIG propagation during a
Contributor

:RFC:`4648` Base64 encoded secret key material. The number of bytes prior
to encoding must be correct for the specified ``<ALGORITHM>``.

Can also be a path to a file containing the Base64 encoded secret material.
Contributor

Could this be ambiguous? E.g. foo/bar (or something like it) could be valid Base64 as well as a filesystem path. One way to make this unambiguous is to require paths to start with / or ./.
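
A minimal sketch of the rule suggested above, assuming the argument arrives as a plain string. `SecretArg` and `classify_secret` are hypothetical names and not part of the Cascade CLI.

```rust
use std::path::PathBuf;

/// Hypothetical classification of the secret argument.
#[derive(Debug, PartialEq)]
enum SecretArg {
    /// Read the Base64 secret from this file.
    File(PathBuf),
    /// Treat the argument itself as Base64 secret material.
    Inline(String),
}

fn classify_secret(arg: &str) -> SecretArg {
    // Only arguments with an explicit path prefix are treated as paths.
    if arg.starts_with('/') || arg.starts_with("./") {
        SecretArg::File(PathBuf::from(arg))
    } else {
        SecretArg::Inline(arg.to_string())
    }
}

fn main() {
    println!("{:?}", classify_secret("foo/bar"));
    println!("{:?}", classify_secret("./foo/bar"));
    println!("{:?}", classify_secret("/etc/cascade/key.b64"));
}
```

Note that `/` is itself part of the standard RFC 4648 Base64 alphabet, so an inline secret could begin with `/` and would be read as a path under this rule. The `./` prefix has no such overlap, since `.` is not a Base64 character.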

addr: SocketAddr,

/// The name of a TSIG key, if any.
tsig_key: Option<String>,
Contributor

Why not store Option<Name<Bytes>> here, and eliminate the fallibility of converting into cascade_api::ZoneSource?

Member Author

domain::base::Name isn't available here, only cascade_api types. The latter currently exposes TsigKeyName which is defined as domain::base::Name<octseq::Array<255>> which, if used here, causes Clippy to complain about an overly large enum variant. How would you suggest to proceed here?
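
For context, one common way to satisfy Clippy's `large_enum_variant` lint is to box the large field so the enum stores only a pointer. The sketch below uses stand-in types: a 255-byte array in place of `TsigKeyName`, and hypothetical enum and variant names. It is not a statement of how this thread was resolved.

```rust
use std::mem::size_of;
use std::net::SocketAddr;

/// Stand-in for a fixed-capacity key name such as one backed by a
/// 255-byte array. Not the real `TsigKeyName` type.
#[allow(dead_code)]
struct KeyName([u8; 255]);

/// Storing the key name inline makes every value of the enum as large
/// as its biggest variant.
#[allow(dead_code)]
enum InlineSource {
    Zonefile,
    Server { addr: SocketAddr, tsig_key: Option<KeyName> },
}

/// Boxing the key name moves it to the heap, so the variant holds only
/// a pointer.
#[allow(dead_code)]
enum BoxedSource {
    Zonefile,
    Server { addr: SocketAddr, tsig_key: Option<Box<KeyName>> },
}

fn main() {
    println!("inline: {} bytes", size_of::<InlineSource>());
    println!("boxed:  {} bytes", size_of::<BoxedSource>());
}
```

The trade-off is an extra heap allocation per key name, which is usually acceptable for values that are created once per configured zone.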

Comment thread doc/manual/source/nsd.rst Outdated
Comment thread doc/manual/source/zone-transfers.rst Outdated
Comment on lines +4 to +7
Cascade is designed to be deployed between a hidden upstream nameserver and
public downstream nameservers. The hidden upstream serves the unsigned zone,
Cascade signs it and passes it to the downstream nameservers for publication
to consumers.
Contributor

Sure... but we intend to support that. The original paragraph here says "Cascade is designed to be" -- i.e. its intention and not necessarily the reality. I think zonefile-based pipelines are part of that design, even if they are not fully supported yet.

Comment on lines +84 to +99
Controlling automatic key rollover zone transfer settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When using automatic key rollover (the default) Cascade will attempt to verify
that certain key properties of the signed zone being served to consumers are
correct.

This verification is done by transferring the zone and inspecting it. By
default transfer is attempted from the nameserver identified by the MNAME
field of the apex SOA record in the zone.

If an alternate nameserver should be queried instead of the MNAME
nameserver, or if a specific port number or TSIG key should be used
to request the transfer, you will also need to configure the Cascade
key manager to fetch the zone correctly. This can be done via the
``key-manager.publication-nameservers`` policy setting.
Contributor

It is in this document as it relates to zone transfers.

At first glance, it appears that this document details how to use Cascade in a zone transfer based pipeline, with regards to its input and output. But the key manager's querying of public nameservers feels distinct from that. Sure, both relate to the underlying mechanism of XFRs, but that is why I don't understand the purpose of this document -- is its purpose "here is how to use Cascade in a zone transfer based pipeline" or "here is every zone transfer related consideration for using Cascade"?

Comment thread src/tsig/mod.rs
Comment on lines -203 to +210
state.tsig_store.mark_dirty(center);
drop(state);
save_now(center);
Contributor

I see. Please add a comment.

Comment thread src/units/key_manager.rs Outdated

ximon18 commented Apr 30, 2026

Following an internal discussion we have agreed to merge this PR as-is. I have extracted a few notable items to separate GH issues so that we don't lose track of them:

@ximon18 ximon18 merged commit 694a941 into main Apr 30, 2026
9 checks passed
@ximon18 ximon18 deleted the tsig-upstream-support branch April 30, 2026 10:05

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Zone loader uses a hard-coded TSIG key

4 participants